Linux Loadable Module
A Linux Loadable Module is a feature of the Linux kernel that allows users to add functionality to the system at runtime without requiring a reboot. It is also known as an LKM, Linux Module, Loadable Module, Kernel Module, or simply Module.

Overview

A Linux Loadable Module is an ELF (Executable and Linkable Format) object file. The insmod utility hands this file to the kernel through the init_module system call. Each module designates two functions to be called when the module is loaded (init) and removed (exit), respectively. In the init function the module performs any initialization it needs, such as allocating memory, registering a device driver or a file system, or hooking kernel functions such as system calls. The exit function does the opposite of the init function and frees all allocated objects.

Definitions

Modules VS Applications

Event Driven VS Sequential

While most small and medium-sized applications perform a single task from beginning to end, a kernel module just registers itself in order to serve future requests, and its initialization function terminates immediately. It then stays resident in kernel memory, waiting for its functions to be called (events).

Aggressive VS Lazy Resource Cleaning

Whereas an application that terminates can be lazy in releasing resources, or avoid cleanup altogether, the exit function of a module must carefully undo everything the init function built up, or the pieces remain around until the system is rebooted.

Kernel Symbol Table VS LibC

User applications can use LibC, while modules can only use symbols exported by the kernel or by other loaded modules.

Oops VS Signals

Whereas a segmentation fault (a signal) is harmless during application development, and a debugger can always be used to trace the error to its cause in the source code, a kernel fault kills at least the current process, if not the whole system, in what is called an Oops.
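The event-driven pattern and the cleanup obligation described above can be sketched in ordinary user-space C. This is only an analogy, not kernel code: the handler slot and the names demo_init, demo_exit, demo_dispatch, and demo_handler are all hypothetical stand-ins for the kernel's registration interfaces.

```c
#include <stddef.h>

/* Hypothetical stand-in for a kernel registration slot. */
typedef int (*demo_handler_t)(int request);

static demo_handler_t registered_handler = NULL;

/* The service routine: runs only when an event arrives. */
int demo_handler(int request)
{
    return request * 2;
}

/* "init": register and return immediately; no request is served yet. */
int demo_init(void)
{
    registered_handler = demo_handler;
    return 0;
}

/* "exit": undo exactly what init set up. */
void demo_exit(void)
{
    registered_handler = NULL;
}

/* Simulates an event being delivered; -1 means nothing is registered. */
int demo_dispatch(int request)
{
    if (registered_handler == NULL)
        return -1;
    return registered_handler(request);
}
```

In this sketch, demo_init does no real work beyond registration, and a forgotten demo_exit would leave the handler installed indefinitely, which mirrors why a module's exit function has to undo everything.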

User Modules VS Kernel Modules

Advantages of user-space drivers

The full C library can be linked in.
The programmer can run a conventional debugger on the driver code.
If a user-space driver hangs, you can simply kill it.
User memory is swappable, unlike kernel memory.
A well-designed driver program can still, like kernel-space drivers, allow concurrent access to a device.
You are allowed to keep the driver closed source.
You don’t have to worry about kernel versions.

Disadvantages of user-space drivers

Interrupts are not available in user space.
Direct access to memory is possible only by mmapping /dev/mem, and only a privileged user can do that.
Access to I/O ports is available only after calling ioperm or iopl. Not all platforms support these system calls, and access through /dev/port can be too slow.
Response time is slower, because a context switch is required.
If the driver has been swapped to disk, response time is unacceptably long. Using the mlock system call might help, but usually you’ll need to lock many memory pages, because a user-space program depends on a lot of library code. mlock, too, is limited to privileged users.
The most important devices can’t be handled in user space, including, but not limited to, network interfaces and block devices.

How-To

Getting Started

Step 1: Setting Up The Environment

First you need to install a Linux source tree. You can either obtain one from an online mirror or use the one from the package repository of your Linux distribution. If you are using YUM (Yellowdog Updater, Modified), run the following command as root to install all required module development files:

    yum -y install kernel-devel

For the following steps, assume you have a source tree under /usr/src/kernels/2.6.X/.

Step 2: Writing The Code

A skeleton of a Linux module:

    // Filename: hello.c
    #include <linux/init.h>
    #include <linux/module.h>

    MODULE_LICENSE("Dual BSD/GPL");

    static int hello_init(void)
    {
        // module is now being loaded
        return 0;
    }

    static void hello_exit(void)
    {
        // module is now being removed
    }

    module_init(hello_init);
    module_exit(hello_exit);

Step 3: Compiling and Building

To compile kernel modules in Linux 2.6 and later, you use the kernel build process. The build process is a collection of scripts, objects, and makefiles that use your own makefile to build your modules. Assume your module source file is called hello.c.

Makefile

    # Filename: Makefile, located in the same directory as hello.c
    # If KERNELRELEASE is defined, we've been invoked from the
    # kernel build system and can use its language.
    ifneq ($(KERNELRELEASE),)
    obj-m := hello.o
    # For a module made of several source files, name the module in
    # obj-m and list its objects in <module>-objs. The module name must
    # differ from every source file name, for example (hellomod is a
    # hypothetical name):
    #   obj-m := hellomod.o
    #   hellomod-objs := hello.o util1.o util2.o
    # Otherwise we were called directly from the command
    # line; invoke the kernel build system.
    else
    KERNELDIR ?= /lib/modules/$(shell uname -r)/build
    PWD := $(shell pwd)
    # The recipe line below must begin with a tab character.
    default:
    	$(MAKE) -C $(KERNELDIR) M=$(PWD) modules
    endif

To use this file, cd to its directory (the same directory as hello.c) and execute the following command:

    make

Note that the file must be named exactly "Makefile", because the kernel build system looks for it under that name. The result of compilation, among many other files, is the module file hello.ko.

The files found in the Documentation/kbuild directory in the kernel source are required reading for anybody wanting to understand all that is really going on beneath the surface.

Step 4: Loading and Unloading

Load with this command (as root):

    insmod ./hello.ko

Unload with this command (as root):

    rmmod hello

Notice that the remove command takes neither the path nor the file extension.

Printing and Viewing Messages from Kernel Space

The printk function behaves similarly to the standard C library function printf.
Where the messages show up depends on the priority of the message, the kernel version you are running, the version of the klogd daemon, your configuration of syslogd, and the type of terminal you are logged in at.

The printk function writes messages into a circular buffer that is __LOG_BUF_LEN bytes long: a value from 4 KB to 1 MB chosen while configuring the kernel. The function then wakes any process that is waiting for messages, that is, any process that is sleeping in the syslog system call or that is reading /proc/kmsg. These two interfaces to the logging engine are almost equivalent, but note that reading from /proc/kmsg consumes the data from the log buffer, whereas the syslog system call can optionally return log data while leaving it for other processes as well. In general, reading the /proc file is easier and is the default behavior for klogd. The dmesg command can be used to look at the content of the buffer without flushing it; the command writes the whole content of the buffer to stdout, whether or not it has already been read.

If the circular buffer fills up, printk wraps around and starts adding new data to the beginning of the buffer, overwriting the oldest data. Therefore, the logging process loses the oldest data.

Priority

printk lets you classify messages according to their severity by associating different loglevels with them. For example:

    printk(KERN_INFO "this is a message with kern_info severity\n");

The loglevel macro expands to a string, which is concatenated to the message text at compile time.
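The compile-time concatenation mentioned above relies on C joining adjacent string literals. The following user-space sketch assumes the 2.6-era convention in which a loglevel is a short tag of the form "<n>"; DEMO_KERN_INFO, demo_info_message, and demo_message_level are hypothetical names used only for illustration, and the real macros come from the kernel headers.

```c
#include <string.h>

/* Hypothetical stand-in for a loglevel macro; assumes the "<n>" tag form. */
#define DEMO_KERN_INFO "<6>"

/* Adjacent string literals are merged by the compiler into one string. */
const char *demo_info_message(void)
{
    return DEMO_KERN_INFO "this is a message with kern_info severity\n";
}

/* Extracts the numeric level from a "<n>..." message, or -1 if absent. */
int demo_message_level(const char *msg)
{
    if (strlen(msg) >= 3 && msg[0] == '<' &&
        msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>')
        return msg[1] - '0';
    return -1;
}
```

Because the tag is just a prefix of the string, no comma separates the macro from the message text in the call.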
There are eight possible loglevel strings, defined in a kernel header; they are listed here in order of decreasing severity:

KERN_EMERG: Used for emergency messages, usually those that precede a crash.
KERN_ALERT: A situation requiring immediate action.
KERN_CRIT: Critical conditions, often related to serious hardware or software failures.
KERN_ERR: Used to report error conditions; device drivers often use KERN_ERR to report hardware difficulties.
KERN_WARNING: Warnings about problematic situations that do not, in themselves, create serious problems with the system.
KERN_NOTICE: Situations that are normal, but still worthy of note. A number of security-related conditions are reported at this level.
KERN_INFO: Informational messages. Many drivers print information about the hardware they find at startup time at this level.
KERN_DEBUG: Used for debugging messages.

A printk statement with no specified priority defaults to DEFAULT_MESSAGE_LOGLEVEL, specified in kernel/printk.c as an integer. In the 2.6.10 kernel, DEFAULT_MESSAGE_LOGLEVEL is KERN_WARNING, but that has been known to change in the past.

Terminal Type and Configuration

If the priority value is numerically less than the integer variable console_loglevel, the message is delivered to the console one line at a time (nothing is sent unless a trailing newline is provided). The variable console_loglevel is initialized to DEFAULT_CONSOLE_LOGLEVEL and can be modified through the sys_syslog system call. One way to change it is by specifying the -c switch when invoking klogd, as specified in the klogd manpage.

It is also possible to read and modify the console loglevel using the text file /proc/sys/kernel/printk. The file hosts four integer values: the current loglevel, the default level for messages that lack an explicit loglevel, the minimum allowed loglevel, and the boot-time default loglevel.
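As a rough illustration of the four values and of the console threshold rule, the sketch below parses a line in the format of /proc/sys/kernel/printk and checks whether a message of a given level would reach the console. The struct and function names are hypothetical, and the sample line in the note that follows is an assumed example, not output captured from a real system.

```c
#include <stdio.h>

/* Hypothetical holder for the four integers in /proc/sys/kernel/printk. */
struct demo_printk_levels {
    int console_loglevel;   /* current console loglevel */
    int default_message;    /* level for messages without an explicit one */
    int minimum_console;    /* minimum allowed console loglevel */
    int default_console;    /* boot-time default console loglevel */
};

/* Parses four whitespace-separated integers; returns 0 on success, -1 otherwise. */
int demo_parse_printk(const char *line, struct demo_printk_levels *out)
{
    int n = sscanf(line, "%d %d %d %d",
                   &out->console_loglevel, &out->default_message,
                   &out->minimum_console, &out->default_console);
    return n == 4 ? 0 : -1;
}

/* A message reaches the console only if its level is numerically lower. */
int demo_reaches_console(int message_level, int console_loglevel)
{
    return message_level < console_loglevel;
}
```

Given an assumed line such as "4 4 1 7", demo_parse_printk yields a console loglevel of 4, so a level-4 message would not be shown, whereas a console loglevel of 8 lets every level from 0 to 7 through.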
Writing a single value to /proc/sys/kernel/printk changes the current loglevel to that value; thus, for example, you can cause all kernel messages to appear at the console by entering the following command:

    echo 8 > /proc/sys/kernel/printk

Note that the kernel writes these messages only to the console, which normally points at a virtual terminal. So, if you are logged in at a pseudo terminal (e.g. xterm) on an X server, you will not be able to see any of them.

klogd and syslogd

If both klogd and syslogd are running on the system, kernel messages are appended to /var/log/messages (or otherwise treated depending on your syslogd configuration), independent of console_loglevel. If klogd is not running, the message won’t reach user space unless you read /proc/kmsg (which is often most easily done with the dmesg command). When using klogd, you should remember that it doesn’t save consecutive identical lines; it only saves the first such line and, at a later time, the number of repetitions it received.

If the klogd process is running, it retrieves kernel messages and dispatches them to syslogd, which in turn checks /etc/syslog.conf to find out how to deal with them. syslogd differentiates between messages according to a facility and a priority; allowable values for both are defined in a system header file. Kernel messages are logged by the LOG_KERN facility at a priority corresponding to the one used in printk (for example, LOG_ERR is used for KERN_ERR messages). If klogd isn’t running, data remains in the circular buffer until someone reads it or the buffer overflows.

Accessing The Calling Process (Current Process)

Kernel code can refer to the current process by accessing the global item current, which yields a pointer to struct task_struct. Actually, current is not truly a global variable. The need to support SMP systems forced the kernel developers to develop a mechanism that finds the current process on the relevant CPU.
This mechanism must also be fast, since references to current happen frequently. The result is an architecture-dependent mechanism that, usually, hides a pointer to the task_struct structure on the kernel stack. The details of the implementation remain hidden to other kernel subsystems though, and a device driver can just include the relevant header and refer to the current process.

task_struct

Dividing Module Responsibility (Module Stacking)

Exporting Symbols

How modprobe Works

Force Removal of Modules

Load-time Configuration (Module Parameters)

Handling Errors During Initialization

Using The goto Statement
Using The Cleanup Function

Checking Kernel Version Dynamically

Annotating Modules

Debugging Kernel Code

Configurable Kernel Debugging Options

CONFIG_DEBUG_KERNEL
CONFIG_DEBUG_SLAB
CONFIG_DEBUG_PAGEALLOC
CONFIG_DEBUG_SPINLOCK
CONFIG_DEBUG_SPINLOCK_SLEEP
CONFIG_INIT_DEBUG
CONFIG_DEBUG_INFO
CONFIG_MAGIC_SYSRQ
CONFIG_DEBUG_STACKOVERFLOW and CONFIG_DEBUG_STACK_USAGE
CONFIG_KALLSYMS
CONFIG_IKCONFIG and CONFIG_IKCONFIG_PROC
CONFIG_ACPI_DEBUG
CONFIG_DEBUG_DRIVER
CONFIG_SCSI_CONSTANTS
CONFIG_INPUT_EVBUG
CONFIG_PROFILING

The strace Command

Other Debugging Solutions

gdb
kdb
kgdb
User Mode Linux (UML)
Linux Trace Toolkit (LTT)
Dynamic Probes (DProbes)

Pitfalls

Mechanism and Policy

The double-underscore function prefix (__)

The Invalid Module Format Error (Version Dependency)

Unnecessary Initialization Memory

Unnecessary Exit Memory

Prevent Allocating Exit Functions In Kernels That Disallow Unloading

Module Loading Races

Limited Stack

The kernel has a very small stack; it can be as small as a single, 4096-byte page. Your functions must share that stack with the entire kernel-space call chain. Thus, it is never a good idea to declare large automatic variables; if you need larger structures, you should allocate them dynamically at call time.

No Floating Point Arithmetic

Kernel code cannot do floating point arithmetic.
Enabling floating point would require that the kernel save and restore the floating point processor’s state on each entry to, and exit from, kernel space, at least on some architectures. Given that there really is no need for floating point in kernel code, the extra overhead is not worthwhile.

Concurrency

Module code becomes part of the kernel, so it is subject to the same concurrency issues. Beyond the general sources of concurrency, Linux has two more:

In Linux: several software abstractions (e.g. kernel timers) run asynchronously.
In Linux 2.6.X: kernels are preemptible.

The Sleep Assumption in Linux

A common mistake made by driver programmers is to assume that concurrency is not a problem as long as a particular segment of code does not go to sleep (or “block”). Even in previous kernels (which were not preemptive), this assumption was not valid on multiprocessor systems. In 2.6, kernel code can (almost) never assume that it can hold the processor over a given stretch of code.

Kernel Debugging Options

Enabling the kernel debugging configuration options slows the system considerably and is not recommended for production kernels.

Contending for printk Messages

The /proc/kmsg file is a FIFO. You can read it by hand. Obviously, you can’t read messages this way if klogd or another process is already reading the same data, because you’ll contend for it.

Clobbering The System With printk Messages

If you want to avoid clobbering your system log with the monitoring messages from your driver, you can either specify the -f (file) option to klogd to instruct it to save messages to a specific file, or customize /etc/syslog.conf to suit your needs. Yet another possibility is to take the brute-force approach: kill klogd and verbosely print messages on an unused virtual terminal, or issue the command cat /proc/kmsg from an unused xterm.

When using a slow console device (e.g., a serial port), an excessive message rate can also slow down the system or just make it unresponsive. The kernel provides a function that can be helpful in such cases:

    int printk_ratelimit(void);

This function should be called before you consider printing a possibly repeating message. Only if it returns a nonzero value should you go ahead and print the message:

    if (printk_ratelimit())
        printk("my repeating message\n");

The behaviour of this function can be changed using /proc/sys/kernel/printk_ratelimit (the number of seconds to wait before re-enabling logging) and /proc/sys/kernel/printk_ratelimit_burst (the number of allowed messages before rate limiting).
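
The effect of the two tunables can be illustrated with a small user-space simulation. This is a simplified fixed-window sketch, not the kernel's actual algorithm; struct demo_ratelimit and demo_ratelimit_ok are hypothetical names, and time is passed in explicitly as seconds so that the behaviour is easy to follow.

```c
/* Hypothetical rate-limit state; the field names are illustrative only. */
struct demo_ratelimit {
    long interval;      /* seconds to wait before logging is re-enabled */
    int burst;          /* messages allowed before limiting starts */
    long window_start;  /* time at which the current window began */
    int printed;        /* messages let through in the current window */
};

/* Returns 1 if a message may be printed at time `now`, 0 if it is suppressed. */
int demo_ratelimit_ok(struct demo_ratelimit *rl, long now)
{
    /* Once the interval has elapsed, open a fresh window. */
    if (now - rl->window_start >= rl->interval) {
        rl->window_start = now;
        rl->printed = 0;
    }
    /* Inside a window, allow at most `burst` messages. */
    if (rl->printed < rl->burst) {
        rl->printed++;
        return 1;
    }
    return 0;
}
```

With an interval of 5 seconds and a burst of 3, the first three calls within a window return 1, further calls return 0, and the next call at or after the 5-second mark is allowed again.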